MLP module
M3-Net: A Cost-Effective Graph-Free MLP-Based Model for Traffic Prediction
Jin, Guangyin, Lai, Sicong, Hao, Xiaoshuai, Zhang, Mingtao, Zhang, Jinlei
Accurate traffic prediction is a fundamental and crucial task in the development of modern intelligent transportation systems. Most mainstream methods that have achieved breakthroughs in traffic prediction rely on spatio-temporal graph neural networks, spatio-temporal attention mechanisms, and similar designs. The main challenge of existing deep learning approaches is that they either depend on a complete traffic network structure or require intricate model designs to capture complex spatio-temporal dependencies. These limitations pose significant obstacles to the efficient deployment and operation of deep learning models on large-scale datasets. To address them, we propose M3-Net, a cost-effective, graph-free Multilayer Perceptron (MLP) based model for traffic prediction. Our model not only employs time-series and spatio-temporal embeddings for efficient feature processing but is also the first to introduce an MLP-Mixer architecture with a mixture-of-experts (MoE) mechanism. Extensive experiments on multiple real-world datasets demonstrate the superiority of the proposed model in both prediction performance and lightweight deployment. Our code is available at https://github.com/jinguangyin/M3_NET
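The abstract above names an MLP-Mixer architecture with a mixture-of-experts (MoE) mechanism but gives no details. Below is a minimal, generic MoE gating sketch in plain Python; the function names (`softmax`, `moe_mix`) and the dot-product gate are illustrative assumptions, not M3-Net's actual design:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of gate scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_mix(x, experts, gate_weights):
    """Combine expert MLPs with a softmax gate.

    x            : input feature vector
    experts      : list of callables, each mapping a vector to a vector
    gate_weights : one weight vector per expert; the gate score here is
                   a simple dot product with the input
    """
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    gates = softmax(scores)
    outputs = [f(x) for f in experts]
    # gate-weighted sum of expert outputs, dimension by dimension
    return [sum(g * out[i] for g, out in zip(gates, outputs))
            for i in range(len(outputs[0]))]
```

With equal gate scores the result is simply the average of the expert outputs, which makes the mechanism easy to sanity-check.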
Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs
Liu, Jiahao, Wang, Zijian, Zhao, Kuo, Hu, Dong
Knowledge editing has emerged as an efficient approach for updating factual knowledge in large language models (LLMs), typically achieved by first locating key knowledge-storage modules and then modifying their parameters. However, most existing methods focus exclusively on updating the weights of Multi-Layer Perceptron (MLP) modules, which are commonly identified as the primary repositories of factual information. Other important components, such as attention (Attn) modules--one of the core modules in LLMs--are often ignored during editing. This biased allocation of updates can leave residual outdated knowledge in the model and limit the effectiveness of knowledge editing. In this paper, we conduct comprehensive and systematic knowledge localization experiments on advanced LLMs, revealing that Attn modules play a substantial role in factual knowledge storage and retrieval, especially in earlier layers. Building on these insights, we propose IntAttn-Edit, a novel method that extends the associative memory paradigm to jointly update both MLP and Attn modules. Our approach employs a knowledge balancing strategy that proportionally allocates update magnitudes based on each module's measured contribution to knowledge storage. Extensive experiments on popular benchmarks demonstrate that IntAttn-Edit consistently achieves superior results over existing methods, delivering higher edit success, improved generalization, and robust knowledge preservation. Further empirical analysis shows that our knowledge balancing strategy enables the editing performance to remain within the optimal range across different settings.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
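The knowledge balancing strategy described above allocates update magnitudes in proportion to each module's measured contribution to knowledge storage. A minimal sketch of such proportional allocation, with a hypothetical `balance_updates` helper (the paper's exact formulation is not given in the abstract):

```python
def balance_updates(total_update, contributions):
    """Split an update budget across modules in proportion to each
    module's measured contribution to knowledge storage.

    total_update  : overall update magnitude to distribute
    contributions : dict mapping module name -> contribution score
    """
    total = sum(contributions.values())
    return {name: total_update * c / total
            for name, c in contributions.items()}
```

For example, if localization finds MLP modules contributing three times as much as attention modules, the MLP share of a unit update budget would be 0.75.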
Training-free Truthfulness Detection via Value Vectors in LLMs
Liu, Runheng, Huang, Heyan, Xiao, Xingchen, Wu, Zhijing
Large language models often generate factually incorrect outputs, motivating efforts to detect the truthfulness of their content. Most existing approaches rely on training probes over internal activations, but these methods suffer from scalability and generalization issues. A recent training-free method, NoVo, addresses this challenge by exploiting statistical patterns from the model itself. However, it focuses exclusively on attention mechanisms, potentially overlooking the MLP module, a core component of Transformer models known to support factual recall. In this paper, we show that certain value vectors within MLP modules exhibit truthfulness-related statistical patterns. Building on this insight, we propose TruthV, a simple and interpretable training-free method that detects content truthfulness by leveraging these value vectors. On the NoVo benchmark, TruthV significantly outperforms both NoVo and log-likelihood baselines, demonstrating that MLP modules, despite being neglected in prior training-free efforts, encode rich and useful signals for truthfulness detection. These findings offer new insights into how truthfulness is internally represented in LLMs and motivate further research on scalable and interpretable truthfulness detection.
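As a rough illustration of scoring content via MLP value vectors: in a Transformer MLP, the hidden activations are the coefficients that scale each value vector (each row of the down-projection). The toy function below averages the coefficients of a chosen subset of value vectors; the function, its inputs, and the scoring rule are assumptions for exposition, not TruthV's actual statistic:

```python
def value_vector_score(hidden_acts, selected_ids):
    """Toy truthfulness score: average activation strength of a set of
    MLP value vectors assumed (hypothetically) to correlate with
    truthful content.

    hidden_acts  : per-position lists of MLP hidden activations
                   (the coefficients that scale each value vector)
    selected_ids : indices of value vectors flagged as truth-sensitive
    """
    per_pos = [sum(acts[i] for i in selected_ids) / len(selected_ids)
               for acts in hidden_acts]
    # average over sequence positions
    return sum(per_pos) / len(per_pos)
```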
Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms
Choe, Minyeong, Cho, Haehyun, Seo, Changho, Kim, Hyunil
Understanding how Transformer-based language models store and retrieve factual associations is critical for improving interpretability and enabling targeted model editing. Prior work, primarily on GPT-style models, has identified MLP modules in early layers as key contributors to factual recall. However, it remains unclear whether these findings generalize across different autoregressive architectures. To address this, we conduct a comprehensive evaluation of factual recall across several models -- including GPT, LLaMA, Qwen, and DeepSeek -- analyzing where and how factual information is encoded and accessed. Consequently, we find that Qwen-based models behave differently from previous patterns: attention modules in the earliest layers contribute more to factual recall than MLP modules. Our findings suggest that even within the autoregressive Transformer family, architectural variations can lead to fundamentally different mechanisms of factual recall.
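A simple way to probe where factual recall lives, in the spirit of the analysis above, is to ablate a module and measure the score drop on a recall probe. The `module_contributions` helper below is a hypothetical stand-in for the paper's (unspecified) localization procedure:

```python
def module_contributions(clean_score, ablated_scores):
    """Rank modules by factual-recall contribution via ablation:
    contribution = clean score minus score with that module disabled.

    clean_score    : probe accuracy of the intact model
    ablated_scores : dict mapping module name -> accuracy with it ablated
    """
    return {name: clean_score - s for name, s in ablated_scores.items()}
```

A large drop when ablating an early-layer attention module (as reported for Qwen-based models) would show up as a high contribution value for that module.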
Augmented Shortcuts for Vision Transformers (Supplementary Material)
Tang, Yehui
Following the observation that shortcut connections exist in both MSA and MLP modules, the proposed augmented shortcuts are also embedded into the MLP module (Eq. 10 in the main paper). Eq. (S.4) gives the diversity in a block of the Aug-ViT model.
SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models
In spite of the strong performance achieved by LLMs, their deployment costs are often unaffordable. For LLM compression, gradient-based pruning methods show promising effectiveness. However, in these methods, computing gradients against one-hot labels ignores the model's potential predictions for other words, discarding information that is key to the generative capability of the original model. To address this issue, we introduce a self-distillation loss during the pruning phase (rather than during post-training) to fully exploit the predictions of the original model, thereby obtaining more accurate gradient information for pruning. Moreover, we find that, compared to attention modules, the predictions of an LLM are less sensitive to multilayer perceptron (MLP) modules, which account for more than $5 \times$ as many parameters (LLaMA3.2-1.2B). We therefore focus pruning on the MLP modules, significantly compressing the LLM without obvious performance degradation. Experimental results on extensive zero-shot benchmarks demonstrate that our method significantly outperforms existing pruning methods. Furthermore, it achieves very competitive performance among 1B-scale open-source LLMs. The source code and trained weights are available at https://github.com/visresearch/SDMPrune.
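The contrast between one-hot cross-entropy and a self-distillation loss can be made concrete: the former uses only the probability of the ground-truth token, while the latter matches the original model's full predictive distribution. A minimal sketch (generic KL-based distillation, not necessarily SDMPrune's exact loss):

```python
import math

def one_hot_ce(student_probs, label):
    # standard cross-entropy against a one-hot label:
    # only the ground-truth entry influences the loss
    return -math.log(student_probs[label])

def self_distill_loss(teacher_probs, student_probs):
    """KL(teacher || student): uses the teacher's full predictive
    distribution, so gradients carry information about every token
    in the vocabulary, not just the ground-truth one."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)
```

When the pruned model matches the original model's distribution exactly, the distillation loss is zero, whereas the one-hot loss still penalizes any probability mass placed on other tokens.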
SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling
Zhao, Anhao, Ye, Fanghua, Fan, Yingqi, Tong, Junlong, Fei, Zhiwei, Su, Hui, Shen, Xiaoyu
Large language models (LLMs) achieve remarkable performance across tasks but incur substantial computational costs due to their deep, multi-layered architectures. Layer pruning has emerged as a strategy to alleviate these inefficiencies, but conventional static pruning methods overlook two critical dynamics inherent to LLM inference: (1) horizontal dynamics, where token-level heterogeneity demands context-aware pruning decisions, and (2) vertical dynamics, where the distinct functional roles of MLP and self-attention layers necessitate component-specific pruning policies. We introduce SkipGPT, a dynamic layer pruning framework designed to optimize computational resource allocation through two core innovations: (1) global token-aware routing to prioritize critical tokens, and (2) decoupled pruning policies for MLP and self-attention components. To mitigate training instability, we propose a two-stage optimization paradigm: first, a disentangled training phase that learns routing strategies via soft parameterization to avoid premature pruning decisions, followed by parameter-efficient LoRA fine-tuning to restore performance impacted by layer removal. Extensive experiments demonstrate that SkipGPT reduces over 40% of model parameters while matching or exceeding the performance of the original dense model across benchmarks. By harmonizing dynamic efficiency with preserved expressivity, SkipGPT advances the practical deployment of scalable, resource-aware LLMs. Our code is publicly available at: https://github.com/EIT-NLP/SkipGPT.
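The global token-aware routing idea can be sketched as a per-token skip decision: tokens whose router score falls below a threshold bypass the module through the residual path. The `route_tokens` helper and its thresholding rule are illustrative assumptions, not SkipGPT's learned router:

```python
def route_tokens(tokens, router_score, threshold, module):
    """Per-token dynamic skipping: run `module` only for tokens whose
    router score clears the threshold; pass the rest through unchanged
    (the residual path of a Transformer block)."""
    return [module(t) if router_score(t) >= threshold else t
            for t in tokens]
```

Decoupled pruning policies would correspond to running this routing with separate routers (and thresholds) for the MLP and self-attention components of each layer.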
FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
Shen, Xuan, Ma, Weize, Zhou, Yufa, Tang, Enhao, Xie, Yanyue, Li, Zhengang, Gong, Yifan, Wang, Quanyi, Ding, Henghui, Wang, Yiwei, Wang, Yanzhi, Zhao, Pu, Lin, Jun, Gu, Jiuxiang
Auto-regressive (AR) models, initially successful in language generation, have recently shown promise in visual generation tasks due to their superior sampling efficiency. Unlike image generation, video generation requires a substantially larger number of tokens to produce coherent temporal frames, resulting in significant overhead during the decoding phase. Our key observations are: (i) MLP modules in the decode phase dominate the inference latency, and (ii) there exists high temporal redundancy in the MLP outputs of adjacent frames. In this paper, we propose the \textbf{FastCar} framework to accelerate the decode phase of AR video generation by exploiting this temporal redundancy. The Temporal Attention Score (TAS) is proposed to determine whether to apply the replay strategy (\textit{i.e.}, reusing cached MLP outputs from the previous frame to reduce redundant computations), with detailed theoretical analysis and justification. We also develop a hardware accelerator on FPGA with Dynamic Resource Scheduling (DRS) based on TAS to enable better resource utilization and faster inference. Experimental results demonstrate the effectiveness of our method, which outperforms traditional sparse attention approaches with more than 2.1x decoding speedup and higher energy efficiency on the edge. Furthermore, by combining FastCar with sparse attention, we can boost the performance of sparse attention with alleviated drifting, demonstrating our unique advantages for high-resolution and long-duration video generation. Code: https://github.com/shawnricecake/fast-car
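The replay strategy described above can be sketched as a cache lookup gated by the temporal attention score: positions whose TAS is high reuse the previous frame's MLP output instead of recomputing it. The `decode_frame` helper below is a simplified sketch under these assumptions, not FastCar's implementation:

```python
def decode_frame(frame_tokens, tas, threshold, mlp, cache):
    """Replay strategy: when the temporal attention score (TAS) for a
    token position is high, reuse the cached MLP output from the
    previous frame; otherwise recompute and refresh the cache."""
    outputs = []
    for pos, tok in enumerate(frame_tokens):
        if pos in cache and tas(pos) >= threshold:
            outputs.append(cache[pos])  # replay cached MLP result
        else:
            out = mlp(tok)              # recompute and refresh cache
            cache[pos] = out
            outputs.append(out)
    return outputs
```

On a second frame with high TAS everywhere, every position replays the first frame's outputs, which is where the decoding speedup would come from.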
Numerical Pruning for Efficient Autoregressive Models
Shen, Xuan, Song, Zhao, Zhou, Yufa, Chen, Bo, Liu, Jing, Zhang, Ruiyi, Rossi, Ryan A., Tan, Hao, Yu, Tong, Chen, Xiang, Zhou, Yufan, Sun, Tong, Zhao, Pu, Wang, Yanzhi, Gu, Jiuxiang
Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high computational costs due to their substantial model size. This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning to improve model efficiency while preserving performance for both language and image generation tasks. Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively. In addition, we propose a compensation algorithm to recover the performance of the pruned model. To verify the effectiveness of our method, we provide both theoretical support and extensive experiments. Our experiments show that our method achieves state-of-the-art performance with reduced memory usage and faster generation speeds on GPUs.
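As an illustration of second-order, training-free pruning scores (the paper's Newton-derived score is not specified in the abstract), the sketch below uses the classic OBD-style saliency $h_{ii} w_i^2 / 2$: the estimated loss increase from zeroing weight $w_i$, given the diagonal Hessian entry $h_{ii}$:

```python
def second_order_saliency(weights, hessian_diag):
    """OBD-style saliency: expected loss increase from zeroing weight i,
    approximated by the second-order Taylor term h_ii * w_i**2 / 2.
    (Illustrative stand-in, not the paper's actual Newton-based score.)"""
    return [h * w * w / 2.0 for w, h in zip(weights, hessian_diag)]

def prune_lowest(weights, hessian_diag, k):
    # zero out the k weights with the smallest saliency (training-free)
    sal = second_order_saliency(weights, hessian_diag)
    to_zero = set(sorted(range(len(weights)), key=lambda i: sal[i])[:k])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]
```

A compensation step, as the abstract mentions, would then adjust the surviving weights to offset the loss introduced by the zeroed ones.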